Truecluster matching

Author

  • Jens Oehlschlägel

Abstract

Cluster matching by permuting cluster labels is important in many clustering contexts, such as cluster validation and cluster ensemble techniques. The classic approach is to minimize the Euclidean distance between two cluster solutions, which induces inappropriate stability in certain settings. We therefore present the truematch algorithm, which introduces two improvements that are best explained in the crisp case. First, instead of maximizing the trace of the cluster crosstable, we propose to maximize a χ²-transformation of this crosstable. Thus, the trace is dominated not by the cells with the largest counts but by the cells with the most non-random observations, taking the marginals into account. Second, we suggest a probabilistic component that breaks ties and makes the matching algorithm truly random on random data. The truematch algorithm is designed as a building block of the truecluster framework and scales in polynomial time. Initial simulation results confirm that the truematch algorithm gives more consistent truecluster results for unequal cluster sizes. Free R software is available.
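The abstract gives no formulas, so the following Python sketch shows one plausible reading of the two improvements: each cell of the crosstable is replaced by its signed chi-square contribution (an assumption; the paper's exact χ-transformation may differ), the column permutation that maximizes the trace is chosen, and ties are broken at random. The function names (`crosstable`, `chi_transform`, `best_permutation`, `relabel`) and the example table are hypothetical, and this is not the truematch code from the author's R software. For brevity it searches all k! permutations, which is not the polynomial-time matching the abstract describes.

```python
# Hypothetical sketch of chi-square-weighted label matching; not the author's truematch code.
import itertools
import math
import random


def crosstable(labels_a, labels_b, k):
    """k x k contingency table: rows are A's labels, columns are B's labels."""
    table = [[0] * k for _ in range(k)]
    for a, b in zip(labels_a, labels_b):
        table[a][b] += 1
    return table


def chi_transform(table):
    """Replace each count O by sign(O - E) * (O - E)**2 / E.

    E = row total * column total / n is the count expected by chance given
    the marginals. The sign is an assumption of this sketch: it stops cells
    with fewer observations than expected from being rewarded.
    """
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    weights = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if expected == 0:
                out.append(0.0)  # empty row or column carries no information
            else:
                diff = observed - expected
                out.append(math.copysign(diff * diff / expected, diff))
        weights.append(out)
    return weights


def best_permutation(weights, rng, tol=1e-9):
    """Return the perm maximizing sum(weights[i][perm[i]]), i.e. the trace
    after reordering B's columns. Exhaustive search over k! permutations
    (small k only); ties within tol are broken at random by reservoir
    sampling."""
    k = len(weights)
    best, best_score, n_ties = None, -math.inf, 0
    for perm in itertools.permutations(range(k)):
        score = sum(weights[i][perm[i]] for i in range(k))
        if score > best_score + tol:
            best, best_score, n_ties = perm, score, 1
        elif abs(score - best_score) <= tol:
            n_ties += 1
            if rng.randrange(n_ties) == 0:  # keep this tie with prob 1/n_ties
                best = perm
    return best


def relabel(labels_b, perm):
    """Rename B's cluster perm[a] to label a, aligning B with A."""
    new_label = {b: a for a, b in enumerate(perm)}
    return [new_label[b] for b in labels_b]


if __name__ == "__main__":
    # Hypothetical crosstable with unequal cluster sizes (rows: A, columns: B).
    table = [[41, 19, 0],
             [0, 0, 20],
             [19, 1, 0]]
    rng = random.Random(42)
    print("raw counts:", best_permutation(table, rng))                 # (0, 2, 1)
    print("chi-square:", best_permutation(chi_transform(table), rng))  # (1, 2, 0)
```

On this made-up table, maximizing raw counts keeps the two large clusters A0 and B0 together (41 shared observations against 36 expected by chance), whereas the χ² weights pair A2 with B0, since 19 of A2's 20 members fall into B0 against 12 expected. For larger k, an assignment solver such as `scipy.optimize.linear_sum_assignment(weights, maximize=True)` would replace the exhaustive search, although randomizing ties would then need a separate step.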

Similar resources

Truecluster: scalable statistical clustering with model selection

Data-based classification is fundamental to most branches of science. Despite progress in statistical computing and predictive modelling, cluster analysis still lacks robust model selection and scalability to large datasets. We consider the important problem of deciding on the optimal number of clusters given an arbitrary definition of space and clusteriness. We show how to cons...

Truecluster: robust scalable clustering with model selection

Data-based classification is fundamental to most branches of science. While recent years have brought enormous progress in various areas of statistical computing and clustering, some general challenges in clustering remain: model selection, robustness, and scalability to large datasets. We consider the important problem of deciding on the optimal number of clusters, given an arbitrary definitio...

Matching Integral Graphs of Small Order

In this paper, we study matching integral graphs of small order. A graph is called matching integral if the zeros of its matching polynomial are all integers. Matching integral graphs were first studied by Akbari, Khalashi, et al., who characterized all traceable graphs that are matching integral and also studied matching integral regular graphs. Furthermore, it has been shown that there is no matc...

Fast Least Square Matching

Least square matching (LSM) is one of the most accurate image matching methods in photogrammetry and remote sensing. Its main disadvantage is high computational complexity, caused by the large size of the observation equations. To address this problem, this paper presents a novel method called fast least square matching (FLSM). The main idea of the proposed FLSM is decreasing t...

Publication date (Truecluster matching): 2007